Genome-resolved metagenomics on anaerobic enrichments from deep subsurface groundwaters
Author
george.westmeijer@umu.se
Published
March 7, 2025
Read data and assign colors
The 16S ribosomal RNA data is from several sequencing projects. In chronological order, P15009 contains the groundwater samples from t0 (sampled Oct 2019), while P21015 contains the samples from t24 (sampled Feb 2020) from identical groundwaters. P21015 contains the sequencing libraries from the cultures as part of phase a (one meteoric groundwater), while P26201 contains the libraries from the cultures as part of phase b (three groundwaters as inoculum).
All 16S libraries were processed with the bioinformatic pipeline nf-core/ampliseq (v2.12.0), using the SBDI-GTDB (R09-RS220-1) as a reference.
This project includes 15 metagenomes from 15 unique cultures. The metagenomes contain 35 de-replicated metagenome-assembled genomes (MAG) that are affiliated with the Bacillota (7), Desulfobacterota (7), Pseudomonadota (6), Bacteroidota (4), Patescibacteria (3), Nanoarchaeota (2), Acidobacteriota (1), Campylobacterota (1), Chloroflexota (1), Iainarchaeota (1), Spirochaetota (1), and Verrucomicrobiota (1).
Figure 1: sampling-site
Expand code
coverm <-read_tsv("data/coverm.tsv.gz", col_types =cols(.default =col_character(), tpm =col_double()))emapper <-read_tsv("data/emapper.tsv.gz", show_col_types =FALSE)# Sample 35 (l7f) was removed from the binning process as this sample did not produce any binsbintax <-read_tsv("data/gtdbtk.summary.tsv", col_types =cols(.default =col_character())) quality <-read_tsv("data/quality_report.tsv", show_col_types = F) %>%rename_with(tolower) %>%rename(genome = name) %>%filter(genome %in% coverm$genome)
rare %>%inner_join(smd, by =c("site"="sample")) %>%# One sample has ~ 1 million reads filter(sample <150000) %>%ggplot(aes(sample, species, group = site, colour = origin)) +geom_line(linewidth =0.75) +scale_color_manual(values = cols.origin, name ="Groundwater", labels =c("Meteoric", "Marine", "Saline")) +labs(x ="No. of sequenced reads", y ="Rarefied No. of ASVs") +scale_x_continuous(labels =c("0", "50.000", "100.000", "150.000")) +theme_tidy() +theme(panel.border =element_rect(fill ="transparent", linewidth =0.75, colour ="#000000"), )
Figure 2: Rarefaction curves for the groundwater samples (t0) for each groundwater type (n = 6).
Expand code
ggsave(filename ="figures/rarecurve.pdf", width =100, height =80, units ="mm")
Overview community structure t0
To visualize the community composition, the most abundant phyla are identified while grouping less abundant phyla as “Other”. ASVs that are not identified on phylum level are included as “Unidentified”.
Expand code
t <- seqtab %>%filter(sample %in% t0) %>%inner_join(tax, by ="seqid") %>%group_by(phylum, sample) %>%# Sum the abundance of each phylum within a samplesummarise(relab =sum(relab), .groups ='drop_last') %>%# Calculate the mean abundance of each phylum over the categoriessummarise(mean_relab =sum(relab), .groups ='drop') %>%filter(!is.na(phylum)) %>%top_n(8, mean_relab)t %>%arrange(desc(mean_relab)) %>% knitr::kable()
p3 <- seqtab %>%filter(sample %in% t0) %>%inner_join(taxref, by ="seqid") %>%# Summarize in order to have the sum for each category and topphylumgroup_by(sample, topphylum) %>%summarise(mrelab =sum(relab), .groups ='drop') %>%# Call the plotggplot(aes(x =fct_relevel(sample, t0), y = mrelab *100, fill =fct_relevel(topphylum, names(cols.phylum) %>%rev()) ) ) +labs(x ='', y ='Relative abundance (%)', fill ="") +geom_col() +scale_y_continuous(expand =c(0.01,0), limits =c(-10,101)) +annotate(geom ="segment", x =0.55, xend =3.45, y =-1, yend =-1, linewidth =0.5) +annotate(geom ="segment", x =3.55, xend =6.45, y =-1, yend =-1, linewidth =0.5) +annotate(geom ="segment", x =6.55, xend =9.45, y =-1, yend =-1, linewidth =0.5) +annotate(geom ="segment", x =9.55, xend =12.45, y =-1, yend =-1, linewidth =0.5) +annotate(geom ="segment", x =12.55, xend =15.45, y =-1, yend =-1, linewidth =0.5) +annotate(geom ="segment", x =15.55, xend =18.45, y =-1, yend =-1, linewidth =0.5) +# Labels datesannotate(geom ="text", x =2, y =-3.5, label ="Oct 2019", size =7/.pt) +annotate(geom ="text", x =5, y =-3.5, label ="April 2020", size =7/.pt) +annotate(geom ="text", x =8, y =-3.5, label ="Oct 2019", size =7/.pt) +annotate(geom ="text", x =11, y =-3.5, label ="April 2020", size =7/.pt) +annotate(geom ="text", x =14, y =-3.5, label ="Oct 2019", size =7/.pt) +annotate(geom ="text", x =17, y =-3.5, label ="April 2020", size =7/.pt) +# Labels groundwaterannotate(geom ="text", x =3.5, y =-8, label ="Meteoric", size =7/.pt) +annotate(geom ="text", x =9.5, y =-8, label ="Marine", size =7/.pt) +annotate(geom ="text", x =15.5, y =-8, label ="Saline", size =7/.pt) +scale_fill_manual(values = cols.phylum) +guides(fill =guide_legend(ncol =3, reverse =TRUE)) +theme_tidy(legend ="bottom") +theme(aspect.ratio =0.75,axis.text.x =element_blank(), axis.ticks.x =element_blank(),legend.key.height =unit(3.5, "mm"),legend.margin =margin(t =-20),legend.text =element_text(margin =margin(l =2)),panel.border =element_rect(fill ="transparent", linewidth =0.75),legend.key.spacing.y =unit(0, "mm") )p3
Figure 3: Community composition of the groundwater samples (t0) for each groundwater type (n = 6) before (October 2019) and after (April 2020) collection of groundwater for inoculation. The eight most abundant phyla are shown, grouping less abundant phyla as “Other” and uncharacterized ASVs as “Unidentified”.
As the hydrochemistry data is from a monitoring program, the sampling data closest to groundwater sampling was selected. All units are mg l-1, except for EC (mS m-1), pH, T (degrees C), and O18 (ppt SMOW)
The number of ASVs per sample are estimated with the specnumber() function from the Vegan library.
Expand code
adiv <- seqtab %>%select(-relab) %>%pivot_wider(names_from ="seqid", values_from ="count", values_fill =0) %>%column_to_rownames('sample') %>% vegan::specnumber() %>%as.data.frame() %>%rownames_to_column('sample') %>%#filter(sample %in% t0) %>%rename(specnumber =2)chem <- adiv %>%# Add the species richness for each sampleinner_join(chem, by ="sample")
In order to run the RDA, the absolute counts are standardized using a Hellinger transformation (square root of the relative abundance).
Expand code
hellinger <- seqtab %>%filter(sample %in% t0) %>%# Standard Vegan transformation: Turn table with with samples as *rows*select(sample, seqid, count) %>%pivot_wider(names_from ="seqid", values_from ="count", values_fill =0) %>%# Turn into a numeric matrixcolumn_to_rownames('sample') %>% vegan::decostand(method ="hellinger") %>%rownames_to_column("sample") %>%pivot_longer(-1, names_to ="seqid", values_to ="hellinger")m <- hellinger %>%# Create a matrixspread(seqid, hellinger) %>%column_to_rownames("sample")
The RDA is called while providing a large number of constraining factors, as the RDA function will by default not include redundant variables (Pearson r > 0.9).
Some constraints or conditions were aliased because they were redundant. This
can happen if terms are linearly dependent (collinear): 'ph_lab', 'ec_lab',
'temp_w', 'nh4_n', 'feii'
Expand code
# This one will contain the proportions explained which we get by dividing the# eigenvalue of each component with the sum of eigenvalues for all components.p <-c(rda$CCA$eig, rda$CA$eig)/sum(c(rda$CCA$eig, rda$CA$eig))# This one is collecting the coordinates for arrows that will depict how the # factors in our model point in the coordinatea <- rda$CCA$biplot %>%data.frame() %>% tibble::rownames_to_column('factor')lab <-tibble(factor =c("o18", "doc"),label =c("delta^18*O", "DOC")) %>%inner_join(a, by ="factor")summary(rda)
ggsave("figures/fig-1.pdf", width =18, height =10, units ="cm")
Network analysis: phylum
Expand code
t <-c("Spirochaetota", "Acidobacteriota", "Atribacterota", "Chloroflexota","Omnitrophota", "Nanoarchaeota", "Campylobacterota", "Pseudomonadota")network.survey <-read_tsv("data/network-survey.tsv", show_col_types = F)# Correlation between the abundance of the phyla using the culturesfit <-tibble("phylum"= t, "r.squared"=NA)for (i in t) { model <-lm(relab ~ cpr, data = network.survey %>%filter(phylum == i)) %>%summary() fit[fit$phylum == i, "r.squared"] <- model$adj.r.squared %>%round(digits =2)}fit
t <-c("Omnitrophota", "Chloroflexota", "Pseudomonadota", "Actinomycetota","Verrucomicrobiota", "Acidobacteriota", "Bacillota", "Spirochaetota")network <- seqtab %>%inner_join(smd, by ="sample") %>%filter(fraction =="non-fractionated") %>%inner_join(tax, by ="seqid") %>%filter(phylum %in% t) %>%group_by(sample, phylum) %>%summarise(relab =sum(relab), .groups ="drop") network <- seqtab %>%inner_join(tax, by ="seqid") %>%filter(phylum =="Patescibacteria") %>%group_by(sample, phylum) %>%summarise(cpr =sum(relab), .groups ="drop") %>%select(-phylum) %>%inner_join(network, by ="sample")# Correlation between the abundance of the phyla using the culturesfit <-tibble("phylum"= t, "r.squared"=NA)for (i in t) { model <-lm(relab ~ cpr, data = network %>%filter(phylum == i)) %>%summary() fit[fit$phylum == i, "r.squared"] <- model$adj.r.squared %>%round(digits =2)}fit <- fit %>%arrange(desc(r.squared))fit
ggsave("figures/fig-s4.pdf", width =140, height =160, units ="mm")
16S data cultures
NMDS cultures three groundwaters (supplemental)
Expand code
set.seed(999)nmds <- seqtab %>%# Full size fractioninner_join(smd, by ="sample") %>%filter(!sample %in%paste("P21015_10", seq(42,81), sep ="")) %>%# Data from bachelor projectfilter(fraction =="non-fractionated"| fraction =="environmental") %>%group_by(seqid) %>%filter(count >2) %>%ungroup() %>%select(seqid, sample, count) %>%spread(seqid, count, fill =0) %>%column_to_rownames("sample") %>% vegan::metaMDS()
Square root transformation
Wisconsin double standardization
Run 0 stress 0.211602
Run 1 stress 0.2345592
Run 2 stress 0.2112576
... New best solution
... Procrustes: rmse 0.008926961 max resid 0.03646891
Run 3 stress 0.212245
Run 4 stress 0.2286573
Run 5 stress 0.211117
... New best solution
... Procrustes: rmse 0.00353371 max resid 0.02077429
Run 6 stress 0.2128131
Run 7 stress 0.2291371
Run 8 stress 0.211117
... New best solution
... Procrustes: rmse 1.543506e-05 max resid 8.833604e-05
... Similar to previous best
Run 9 stress 0.4051623
Run 10 stress 0.2423527
Run 11 stress 0.2308933
Run 12 stress 0.212245
Run 13 stress 0.2461051
Run 14 stress 0.2462855
Run 15 stress 0.2210886
Run 16 stress 0.2335761
Run 17 stress 0.2339457
Run 18 stress 0.211117
... Procrustes: rmse 1.554376e-05 max resid 6.999867e-05
... Similar to previous best
Run 19 stress 0.2291882
Run 20 stress 0.2318306
*** Best solution repeated 2 times
Expand code
vegan::scores(nmds)$sites %>%as.data.frame() %>%rownames_to_column("sample") %>%inner_join(smd, by ="sample") -> nmds.scores
nmds <- seqtab %>%# Small size fractioninner_join(smd, by ="sample") %>%filter(!sample %in%paste("P21015_10", seq(42,81), sep ="")) %>%# Data from bachelor projectfilter(fraction =="0.1 - 0.45 µm"| fraction =="environmental") %>%group_by(seqid) %>%filter(count >2) %>%ungroup() %>%select(seqid, sample, count) %>%spread(seqid, count, fill =0) %>%column_to_rownames("sample") %>% vegan::metaMDS()
Square root transformation
Wisconsin double standardization
Run 0 stress 0.1068401
Run 1 stress 0.10684
... New best solution
... Procrustes: rmse 2.657211e-05 max resid 9.457025e-05
... Similar to previous best
Run 2 stress 0.10684
... Procrustes: rmse 5.550912e-05 max resid 0.0002287036
... Similar to previous best
Run 3 stress 0.10684
... New best solution
... Procrustes: rmse 4.668161e-06 max resid 1.871186e-05
... Similar to previous best
Run 4 stress 0.1068401
... Procrustes: rmse 8.694064e-05 max resid 0.0003545082
... Similar to previous best
Run 5 stress 0.10684
... Procrustes: rmse 6.414762e-05 max resid 0.0002643024
... Similar to previous best
Run 6 stress 0.1068401
... Procrustes: rmse 0.000154515 max resid 0.0006358123
... Similar to previous best
Run 7 stress 0.10684
... Procrustes: rmse 6.989599e-05 max resid 0.0002879014
... Similar to previous best
Run 8 stress 0.1068401
... Procrustes: rmse 0.0001356496 max resid 0.000558505
... Similar to previous best
Run 9 stress 0.10684
... Procrustes: rmse 0.0001036085 max resid 0.0004268339
... Similar to previous best
Run 10 stress 0.10684
... New best solution
... Procrustes: rmse 1.60265e-05 max resid 6.549052e-05
... Similar to previous best
Run 11 stress 0.10684
... Procrustes: rmse 3.802136e-05 max resid 0.0001549548
... Similar to previous best
Run 12 stress 0.10684
... Procrustes: rmse 0.0001040217 max resid 0.0004264123
... Similar to previous best
Run 13 stress 0.10684
... Procrustes: rmse 7.764992e-05 max resid 0.0003155699
... Similar to previous best
Run 14 stress 0.10684
... Procrustes: rmse 3.176487e-05 max resid 0.0001310893
... Similar to previous best
Run 15 stress 0.1068401
... Procrustes: rmse 0.0001395685 max resid 0.0005709902
... Similar to previous best
Run 16 stress 0.10684
... Procrustes: rmse 8.264614e-05 max resid 0.0003352701
... Similar to previous best
Run 17 stress 0.10684
... Procrustes: rmse 3.730486e-05 max resid 0.0001529986
... Similar to previous best
Run 18 stress 0.1068401
... Procrustes: rmse 9.747056e-05 max resid 0.0003963954
... Similar to previous best
Run 19 stress 0.10684
... Procrustes: rmse 4.60456e-05 max resid 0.0001755907
... Similar to previous best
Run 20 stress 0.10684
... Procrustes: rmse 7.772982e-05 max resid 0.0003197417
... Similar to previous best
*** Best solution repeated 11 times
Expand code
vegan::scores(nmds)$sites %>%as.data.frame() %>%rownames_to_column("sample") %>%inner_join(smd, by ="sample") -> nmds.scores
Square root transformation
Wisconsin double standardization
Run 0 stress 0.1203684
Run 1 stress 0.1448066
Run 2 stress 0.1374292
Run 3 stress 0.1212802
Run 4 stress 0.1331208
Run 5 stress 0.1230642
Run 6 stress 0.13485
Run 7 stress 0.14842
Run 8 stress 0.133246
Run 9 stress 0.1408211
Run 10 stress 0.1212186
Run 11 stress 0.1440819
Run 12 stress 0.1336567
Run 13 stress 0.1400868
Run 14 stress 0.1447862
Run 15 stress 0.1504146
Run 16 stress 0.1212186
Run 17 stress 0.1342165
Run 18 stress 0.1305863
Run 19 stress 0.133246
Run 20 stress 0.1476626
*** Best solution was not repeated -- monoMDS stopping criteria:
14: stress ratio > sratmax
6: scale factor of the gradient < sfgrmin
Expand code
nmds.scores <- vegan::scores(nmds)$sites %>%as.data.frame() %>%rownames_to_column("sample") %>%inner_join(smd, by ="sample")
Expand code
# 'i' contains the samples that will be included in panel ci <-c("P21015_1045", "P21015_1049", "P21015_1053", "P21015_1065", "P21015_1069", "P21015_1077", # Lysate"P21015_1043", "P21015_1047", "P21015_1051", "P21015_1055", "P21015_1075", "P21015_1079"# Acetate)i <- smd %>%filter(grepl("P21", sample)) %>%filter(fraction =="fractionated") %>%arrange(medium) %>%pull(sample)p2 <- nmds.scores %>%mutate(medium =factor(medium, levels =c("ace","lys","env"))) %>%ggplot(aes(x = NMDS1, y = NMDS2, color = fraction, shape = medium,fill = fraction ) ) +geom_vline(xintercept =0, linetype ='dotted') +geom_hline(yintercept =0, linetype ='dotted') +geom_point(size =4.0, stroke =0.75) +geom_text(data = nmds.scores, aes(label = age), size =6/.pt, color ="black", na.rm =TRUE) +scale_shape_manual(values =c("ace"=23, "lys"=24, "env"=21), labels =c("Acetate", "Lysate", "Environmental")) +scale_fill_manual(values =alpha(cols.fraction, alpha =0.2), labels =c("Environmental", "< 0.45 µm", "All cells")) +scale_color_manual(values = cols.fraction, labels =c("Environmental", "0.1 - 0.45 µm", "All cells")) +annotate('text', x =Inf, y =-Inf, size =6/.pt, hjust =1.05, vjust =-0.5, label =paste('Stress = ', round(nmds$stress, digits =2))) +theme_tidy(legend ="bottom", fontsize =6) +theme(legend.title =element_blank(),legend.box ="vertical",legend.spacing.x =unit(0, "mm"),legend.box.margin =margin(),legend.margin =margin(),legend.text =element_text(margin =margin(l =2, r =8, unit ="pt")),legend.key.spacing.x =unit(0, "mm") )
Expand code
cols.phylum <-c("Pseudomonadota"="#972D15","Spirochaetota"="#D8B70A","Patescibacteria"="#C7B19C","Nanoarchaeota"="#046C9A","Omnitrophota"="#00A08A","Desulfobacterota"="#A2A475","Acidobacteriota"="#FAEFD1" ) p3 <- seqtab %>%filter(sample %in% i) %>%inner_join(tax, by ="seqid") %>%filter(phylum %in%names(cols.phylum)) %>%group_by(sample, phylum) %>%summarise(n =sum(relab) *100, .groups ="drop") %>%# Order the samples and the phylamutate(phylum =factor(phylum, levels =names(cols.phylum))) %>%mutate(sample =factor(sample, levels = i)) %>%ggplot(aes(x = sample,y = n,fill =fct_rev(phylum) )) +geom_col() +scale_fill_manual(values = cols.phylum) +scale_y_continuous(limits =c(NA, 100)) +geom_text(data = smd %>%filter(sample %in% i) %>%mutate(sample =factor(sample, i)),aes(x = sample, y =3, label = age), size =6/.pt, inherit.aes =FALSE ) +labs(x ="", y ="Relative abundance (%)") +theme_tidy(legend ="bottom", fontsize =6) +guides(fill =guide_legend(ncol =2, reverse =TRUE)) +annotate("segment", x =0.5, xend =10.45, y =-2, linewidth =0.5) +annotate("text", x =5.5, y =-5, label ="Acetate", size =6/.pt) +annotate("segment", x =10.55, xend =20.45, y =-2, linewidth =0.5) +annotate("text", x =15.5, y =-5, label ="Lysate", size =6/.pt) +theme(axis.text.x =element_blank(),axis.ticks.x =element_blank(),legend.title =element_blank(),legend.key.width =unit(3.5, "mm"), legend.key.height =unit(4.5, "mm"),legend.text =element_text(margin =margin(l =2)),legend.key.spacing.y =unit(0.2, "mm"),legend.margin =margin(t =-15) )
ggsave(filename ="figures/fig-2.pdf", width =14, height =14, units ="cm")
Barplot KR0015
Expand code
t <- seqtab %>%filter(sample %in%paste("P21015", seq(1042,1081), sep ="_") ) %>%inner_join(tax, by ="seqid") %>%group_by(phylum, sample) %>%# Sum the abundance of each phylum within a samplesummarise(relab =sum(relab), .groups ='drop_last') %>%# Calculate the mean abundance of each phylum over the categoriessummarise(mean_relab =sum(relab), .groups ='drop') %>%filter(!is.na(phylum)) %>%top_n(9, mean_relab) %>%filter(phylum !="Bacteroidota")# The most abundant phyla, averaged over all culturest %>%arrange(desc(mean_relab))
seqtab %>%filter(sample %in%paste("P21015", seq(1042,1081), sep ="_") ) %>%inner_join(taxref, by ="seqid") %>%# Summarize in order to have the sum for each category and topphylumgroup_by(sample, topphylum) %>%summarise(relab =sum(relab), .groups ='drop') %>%inner_join(smd, by ="sample") %>%mutate(age =as.numeric(.$age)) %>%mutate(fraction =case_when( fraction =="non-fractionated"& medium =="ace"~"Acetate, all cells", fraction =="non-fractionated"& medium =="lys"~"Lysate, all cells", fraction =="fractionated"& medium =="ace"~"Acetate, < 0.45 µm fraction", fraction =="fractionated"& medium =="lys"~"Lysate, < 0.45 µm fraction" )) %>%mutate(topphylum =factor(topphylum, levels =rev(names(cols.phylum)))) %>%# Call the plotggplot(aes( age, relab *100, fill = topphylum)) +geom_col() +scale_x_continuous(n.breaks =10) +facet_wrap(~ fraction, nrow =1) +labs(x ='Week', y ='Relative abundance (%)', fill ='') +scale_fill_manual(values = cols.phylum) +guides(fill =guide_legend(reverse =TRUE, ncol =3)) +theme_tidy(legend ="bottom") +theme(aspect.ratio =1.2,panel.border =element_rect(fill ="transparent", linewidth =0.5),strip.background =element_blank(), legend.background =element_blank(),legend.key.height =unit(4, "mm"),legend.key.spacing.y =unit(0, "mm"),plot.margin =margin(),axis.ticks.length =unit(0.5, "mm"),legend.box.margin =margin(t =-5) )
Expand code
ggsave("figures/fig-s3.pdf", height =9, width =18, units ="cm")
Barplot KR0015 - SA1420 - SA2600
Expand code
t <- seqtab %>%inner_join(smd, by ="sample") %>%filter(groundwater =="KR0015") %>%filter(grepl("P26", sample)) %>%inner_join(tax, by ="seqid") %>%group_by(phylum, sample) %>%# Sum the abundance of each phylum within a samplesummarise(relab =sum(relab), .groups ='drop_last') %>%# Calculate the mean abundance of each phylum over the categoriessummarise(mean_relab =sum(relab), .groups ='drop') %>%top_n(8, mean_relab) %>%arrange(desc(mean_relab)) %>%pull(phylum)
p1 <- seqtab %>%filter(grepl("P26", sample)) %>%inner_join(tax, by ="seqid") %>%mutate(phylum =if_else(phylum %in% t, phylum, "Other")) %>%# Summarize in order to have the sum for each category and topphylumgroup_by(sample, phylum) %>%summarise(relab =sum(relab), .groups ='drop') %>%inner_join(smd, by ="sample") %>%filter(groundwater =="KR0015") %>%mutate(phylum =factor(phylum, levels =names(rev(cols.phylum)))) %>%mutate(fraction =case_when( fraction =="fractionated"& medium =="lys"~"Fractionated: lysate", fraction =="fractionated"& medium =="srm"~"Fractionated: acetate", fraction =="non-fractionated"& medium =="lys"~"All cells: lysate", fraction =="non-fractionated"& medium =="srm"~"All cells: acetate", )) %>%# Call the plotggplot(aes( sample, relab *100, fill = phylum)) +geom_col(show.legend =TRUE) +facet_wrap(~ fraction, scales ="free_x", nrow =1) +labs(x ="", y ='Relative abundance (%)', fill ='') +scale_fill_manual(values = cols.phylum, drop =FALSE) +theme_tidy(legend ="bottom") +guides(fill =guide_legend(nrow =3, reverse =TRUE)) +theme(axis.text.x =element_blank(),axis.ticks.x =element_blank(),legend.key.width =unit(3.5, "mm"),legend.text =element_text(margin =margin(l =2)),legend.margin =margin(t =-10) )
Expand code
t <- seqtab %>%inner_join(smd, by ="sample") %>%filter(groundwater =="SA1420") %>%filter(grepl("P26", sample)) %>%inner_join(tax, by ="seqid") %>%group_by(phylum, sample) %>%# Sum the abundance of each phylum within a samplesummarise(relab =sum(relab), .groups ='drop_last') %>%# Calculate the mean abundance of each phylum over the categoriessummarise(mean_relab =sum(relab), .groups ='drop') %>%top_n(8, mean_relab) %>%arrange(desc(mean_relab)) %>%pull(phylum)
Expand code
p2 <- seqtab %>%filter(grepl("P26", sample)) %>%inner_join(tax, by ="seqid") %>%mutate(phylum =if_else(phylum %in% t, phylum, "Other")) %>%# Summarize in order to have the sum for each category and topphylumgroup_by(sample, phylum) %>%summarise(relab =sum(relab), .groups ='drop') %>%inner_join(smd, by ="sample") %>%filter(groundwater =="SA1420") %>%mutate(phylum =factor(phylum, levels =names(rev(cols.phylum)))) %>%mutate(fraction =case_when( fraction =="fractionated"& medium =="lys"~"Fractionated: lysate", fraction =="fractionated"& medium =="srm"~"Fractionated: acetate", fraction =="non-fractionated"& medium =="lys"~"All cells: lysate", fraction =="non-fractionated"& medium =="srm"~"All cells: acetate", )) %>%# Call the plotggplot(aes( sample, relab *100, fill = phylum)) +geom_col(show.legend =TRUE) +facet_wrap(~ fraction, scales ="free_x", nrow =1) +labs(x ="", y ='Relative abundance (%)', fill ='') +scale_fill_manual(values = cols.phylum, drop =FALSE) +theme_tidy(legend ="bottom") +guides(fill =guide_legend(nrow =3, reverse =TRUE)) +theme(axis.text.x =element_blank(),axis.ticks.x =element_blank(),legend.key.width =unit(3.5, "mm"),legend.text =element_text(margin =margin(l =2)),legend.margin =margin(t =-10) )
Expand code
t <- seqtab %>%inner_join(smd, by ="sample") %>%filter(groundwater =="SA2600") %>%filter(grepl("P26", sample)) %>%inner_join(tax, by ="seqid") %>%group_by(phylum, sample) %>%# Sum the abundance of each phylum within a samplesummarise(relab =sum(relab), .groups ='drop_last') %>%# Calculate the mean abundance of each phylum over the categoriessummarise(mean_relab =sum(relab), .groups ='drop') %>%top_n(8, mean_relab) %>%arrange(desc(mean_relab)) %>%pull(phylum)
Expand code
p3 <- seqtab %>%filter(grepl("P26", sample)) %>%inner_join(tax, by ="seqid") %>%mutate(phylum =if_else(phylum %in% t, phylum, "Other")) %>%# Summarize in order to have the sum for each category and topphylumgroup_by(sample, phylum) %>%summarise(relab =sum(relab), .groups ='drop') %>%inner_join(smd, by ="sample") %>%filter(groundwater =="SA2600") %>%mutate(phylum =factor(phylum, levels =names(rev(cols.phylum)))) %>%mutate(fraction =case_when( fraction =="fractionated"& medium =="lys"~"Fractionated: lysate", fraction =="fractionated"& medium =="srm"~"Fractionated: acetate", fraction =="non-fractionated"& medium =="lys"~"All cells: lysate", fraction =="non-fractionated"& medium =="srm"~"All cells: acetate", )) %>%# Call the plotggplot(aes( sample, relab *100, fill = phylum)) +geom_col(show.legend =TRUE) +facet_wrap(~ fraction, scales ="free_x", nrow =1) +labs(x ="", y ='Relative abundance (%)', fill ='') +scale_fill_manual(values = cols.phylum, drop =FALSE) +theme_tidy(legend ="bottom") +guides(fill =guide_legend(nrow =3, reverse =TRUE)) +theme(axis.text.x =element_blank(),axis.ticks.x =element_blank(),legend.key.width =unit(3.5, "mm"),legend.text =element_text(margin =margin(l =2)),legend.margin =margin(t =-10) )
lm(coding_density ~ genome_size, data = quality) %>%summary()
Call:
lm(formula = coding_density ~ genome_size, data = quality)
Residuals:
Min 1Q Median 3Q Max
-0.05556 -0.01303 -0.00046 0.01275 0.03787
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.172e-01 8.112e-03 113.064 < 2e-16 ***
genome_size -6.991e-09 2.275e-09 -3.072 0.00424 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.02103 on 33 degrees of freedom
Multiple R-squared: 0.2224, Adjusted R-squared: 0.1989
F-statistic: 9.439 on 1 and 33 DF, p-value: 0.004236
Expand code
p3 <- i %>%ggplot(aes( genome.size, coding_density, color = phylum,fill = phylum )) +geom_smooth(method ="lm", formula = y ~ x, colour ="black", fill ="#d9d9d9", se =TRUE, linewidth =0.4) +geom_point(size =2.5, shape =21, stroke =0.5) +labs(x ="Estimated genome size (Mb)", y ="Estimated coding density", color ="") +scale_colour_manual(values = cols.mag, guide ="none") +scale_fill_manual(values =alpha(cols.mag, alpha =0.5), guide ="none") +theme_tidy() +scale_x_continuous(breaks =c(1,2,3,4,5,6)) +scale_y_continuous(n.breaks =7) +annotate("text", x =Inf, y =Inf, label ="paste(R ^ 2, \" = 0.20\")", size =7/.pt, vjust =1, hjust =1, parse =TRUE) +theme(axis.line =element_line(),panel.border =element_blank() )
Expand code
nmds <- coverm %>%# Sum the abundance of both coverage per metagenomemutate(abundance =1) %>%inner_join(emapper, by ="genome", relationship ="many-to-many") %>%# Filter out NA and add multiple KOs on new linefilter(kegg_ko !="-") %>%separate_rows(kegg_ko, sep =",") %>%# Sum the abundance of each KO within the metagenomesgroup_by(genome, kegg_ko) %>%summarise(abundance =sum(abundance), .groups ="drop") %>%# Pivot to prepare a numerical matrixpivot_wider(names_from ="kegg_ko", values_from ="abundance", values_fill =0) %>%column_to_rownames("genome") %>% vegan::metaMDS()
Square root transformation
Wisconsin double standardization
Run 0 stress 0.06277184
Run 1 stress 0.06384113
Run 2 stress 0.06856721
Run 3 stress 0.06286035
... Procrustes: rmse 0.07810586 max resid 0.1485213
Run 4 stress 0.06368732
Run 5 stress 0.06286033
... Procrustes: rmse 0.07811342 max resid 0.1485397
Run 6 stress 0.06362823
Run 7 stress 0.06902555
Run 8 stress 0.06610897
Run 9 stress 0.0679945
Run 10 stress 0.06384089
Run 11 stress 0.06299223
... Procrustes: rmse 0.07837556 max resid 0.1496081
Run 12 stress 0.06299208
... Procrustes: rmse 0.07823435 max resid 0.1492526
Run 13 stress 0.06386464
Run 14 stress 0.07029334
Run 15 stress 0.06796592
Run 16 stress 0.06500024
Run 17 stress 0.1937571
Run 18 stress 0.06459237
Run 19 stress 0.06660911
Run 20 stress 0.06299231
... Procrustes: rmse 0.0783978 max resid 0.1496653
*** Best solution was not repeated -- monoMDS stopping criteria:
2: no. of iterations >= maxit
18: stress ratio > sratmax